SECTION I

Work  Related  to  Professor  Bose 


(a) Enumeration of (N + 1, k + 1) Linear Codes of Minimum Weight ≥ m + 2

A k-flat in PG(N,s) is said to be of minimum weight ≥ w if every point of the k-flat has at least w non-zero coordinates. To each k-flat in PG(N,s) with minimum weight ≥ m + 2 there corresponds a linear (N + 1, k + 1) code of minimum weight ≥ m + 2. Let $F_m(N,k,s)$ denote the number of k-flats of the above type. This function then enumerates these codes. In particular, a necessary and sufficient condition for the existence of at least one (N + 1, k + 1) linear code of minimum weight ≥ m + 2 is that $F_m(N,k,s) > 0$. The enumeration of this function is therefore an important problem. A complete solution of this problem would also supply an answer to the packing problem which, besides being of importance for coding theory, has applications in factorial designs and information retrieval [1].

Work on this problem was started in collaboration with Gene Berg [2], has been continued in collaboration with Linda Rollins, and is still continuing. The results obtained so far are summarized below.


For $0 \le i \le N$, let $X_i$ denote the point in PG(N,s) whose vector consists of all zeroes except for a 1 in the (i + 1)-th coordinate. The fundamental simplex $\Delta$ is defined as the simplex formed by the points $X_0, X_1, \ldots, X_N$. The i-flat spanned by any i + 1 of the vertices $X_0, X_1, \ldots, X_N$ is called an i-cell of $\Delta$. Then $F_m(N,k,s)$ is the number of k-flats which intersect no m-cell of $\Delta$.

Let $H = \{X_0, X_1, \ldots, X_N\}$. An n-partition of H is a family $\mathcal{A} = \{A_1, A_2, \ldots, A_a\}$ of subsets $A_i$ of H such that $|A_i| \ge n$, $\bigcup A_i = H$, and every n-element subset of H is contained in a unique member of $\mathcal{A}$. A one-partition is an ordinary partition. If $\mathcal{A}$ and $\mathcal{B}$ are n-partitions of H we say that $\mathcal{A} \le \mathcal{B}$ if and only if $|A_i \cap B_j| \ge n$ implies $A_i \subseteq B_j$. With this ordering the set of all n-partitions of H is a lattice which we denote by $L_n$. In particular, $L_1$ is the lattice of partitions of H ordered by refinement.

The  following  results  have  been  obtained: 

Theorem 1. If T is a k-flat in PG(N,s) which intersects no m-cell of $\Delta$, then there exists an (m + 1)-partition $\mathcal{A} = \{A_1, A_2, \ldots, A_a\}$ of H such that T intersects every (m + 1)-cell of $\Delta$ in $A_i$ (i = 1, 2, ..., a) and no other m-cell of $\Delta$.

For $\mathcal{A} = \{A_1, A_2, \ldots, A_a\} \in L_m$ we define $M(\mathcal{A})$ = the number of k-flats in PG(N,s) which intersect no (m − 1)-cell of $\Delta$ but do intersect every m-cell of $\Delta$ in $A_i$ (i = 1, 2, ..., a) and perhaps other m-cells of $\Delta$.

Similarly, $N(\mathcal{A})$ = the number of k-flats in PG(N,s) which intersect no (m − 1)-cell of $\Delta$ but do intersect every m-cell of $\Delta$ in $A_i$ (i = 1, 2, ..., a) and no other m-cells of $\Delta$.

Theorem 2. $F_m(N,k,s) = N(\phi) = \sum_{\mathcal{B} \in L_m} M(\mathcal{B})\, \mu(\phi, \mathcal{B})$, where $\mu$ is the Möbius function of $L_m$ (in the sense of Rota's theory of generalized Möbius functions) and $\phi$ is the m-partition of H consisting of all $\binom{N+1}{m}$ m-subsets of H.

Theorem 3. $F_0(N,k,s) = \sum_{i=0}^{k+1} (-1)^{i} \binom{N+1}{i}\, \Phi(N-i,\, k-i,\, s)$, where $\Phi(N,m,s)$ denotes as usual the number of m-flats in PG(N,s).

Theorem 4. If we set $F_{0,s}(M,D) = F_0(M-1,\, M-1-D,\, s)$, where s is a fixed prime power, then the function $F_{0,s}$ obeys the recurrence relation

$F_{0,s}(M,D) = F_{0,s}(M-1,\, D-1) + (s^{D} - 1)\, F_{0,s}(M-1,\, D)$.

Theorem 5. Let $F_0^{M}(x) = \sum_{D} F_{0,s}(M,D)\, x^{D}$; then

$F_0^{M}(x) = (x - 1)\, F_0^{M-1}(x) + F_0^{M-1}(sx)$.
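The case m = 0 can be checked numerically. The sketch below is illustrative only and is not the program used in this work. It assumes two readings of the results above: the inclusion-exclusion count of k-flats through none of the N + 1 simplex vertices, and the recurrence F(M,D) = F(M−1,D−1) + (s^D − 1) F(M−1,D). The function names and the base case F(0,0) = 1 are assumptions made for the example.

```python
from functools import lru_cache
from math import comb


def gaussian_binomial(n: int, r: int, s: int) -> int:
    """Number of r-dimensional subspaces of an n-dimensional space over GF(s)."""
    if r < 0 or r > n:
        return 0
    num, den = 1, 1
    for j in range(r):
        num *= s ** (n - j) - 1
        den *= s ** (j + 1) - 1
    return num // den


def phi(n: int, m: int, s: int) -> int:
    """Number of m-flats in PG(n, s); the empty flat (m = -1) counts once."""
    return gaussian_binomial(n + 1, m + 1, s)


def f0_inclusion_exclusion(n: int, k: int, s: int) -> int:
    """k-flats of PG(n, s) that contain none of the n + 1 simplex vertices."""
    return sum((-1) ** i * comb(n + 1, i) * phi(n - i, k - i, s)
               for i in range(k + 2))


@lru_cache(maxsize=None)
def f0_recurrence(m: int, d: int, s: int) -> int:
    """F_0(m - 1, m - 1 - d, s) computed from the recurrence alone."""
    if d < 0 or d > m:
        return 0
    if m == 0:
        return 1  # assumed base case (empty configuration)
    return (f0_recurrence(m - 1, d - 1, s)
            + (s ** d - 1) * f0_recurrence(m - 1, d, s))


if __name__ == "__main__":
    # Lines of PG(2, 2) that miss all three vertices of the simplex.
    print(f0_inclusion_exclusion(2, 1, 2))
    # The same quantity through the recurrence (M = 3, D = 1).
    print(f0_recurrence(3, 1, 2))
```

Both routes agree on small cases, which is a useful sanity check on the recurrence before it is used for larger parameters.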


Theorem 6. Let

$f(M,a) = \sum_{P(M,a)} \dfrac{M!}{1^{v_1}\, 2^{v_2} \cdots M^{v_M}\; v_1!\, v_2! \cdots v_M!}$,

where the summation is over all partitions of M into 'a' parts ($v_j$ being the number of parts equal to j); then f(M,a) obeys the recurrence relation

$f(M,a) = (M - 1)\, f(M-1,\, a) + f(M-1,\, a-1)$.


Theorem 7.

$F_1(M-1,\, M-1-D,\, s) = \sum_{a=0}^{M} (-1)^{M-a} (s-1)^{M-a} f(M,a) \sum_{i=0}^{a-D} (-1)^{i} \binom{a}{i}\, \Phi(a-1-i,\, D-1,\, s)$.


Theorem 8. If we set $G_s(M,D) = F_1(M-1,\, M-1-D,\, s)$, then the function $G_s(M,D)$ obeys the recurrence relation

$G_s(M,D) = G_s(M-1,\, D-1) + \left[(s^{D} - 1) - (M-1)(s-1)\right] G_s(M-1,\, D)$.

Theorem 9. If $G^{M}(x) = \sum_{D} G_s(M,D)\, x^{D}$, then

$G^{M}(x) = \left[x - (M-1)(s-1) - 1\right] G^{M-1}(x) + G^{M-1}(sx)$.


These theorems give the basic properties of the function $F_m(N,k,s)$ for the cases m = 0 and 1, and allow us to enumerate it for any value of the parameters N, k, s by using a very simple computer program.
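As an indication of how short such a program can be, here is a minimal sketch for the case m = 1. It is not the program referred to above. It assumes the recurrence G(M,D) = G(M−1,D−1) + [(s^D − 1) − (M−1)(s−1)] G(M−1,D), with G(M,D) = F_1(M−1, M−1−D, s), and the base case G(0,0) = 1, which is an assumption of the example.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def g(m: int, d: int, s: int) -> int:
    """Assumed reading of the m = 1 recurrence; g(m, d, s) = F_1(m-1, m-1-d, s)."""
    if d < 0 or d > m:
        return 0
    if m == 0:
        return 1  # assumed base case
    coefficient = (s ** d - 1) - (m - 1) * (s - 1)
    return g(m - 1, d - 1, s) + coefficient * g(m - 1, d, s)


if __name__ == "__main__":
    # Binary (7, 4) codes of minimum weight >= 3, i.e. s = 2, M = 7, D = 3.
    print(g(7, 3, 2))  # 30
    # No binary (8, 5) code has minimum weight >= 3.
    print(g(8, 3, 2))  # 0
```

The value 30 for s = 2, M = 7, D = 3 agrees with the known number of binary Hamming codes of length 7, namely 7!/|GL(3,2)| = 5040/168.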

A paper embodying these results is being prepared. Work on the case m = 2 has been started and some preliminary results obtained. It is proposed to continue this work.
(b) Subdesigns of a

Given any design D, whether factorial or non-factorial, with a given structure, a subset of its treatments and a subset of its blocks may form another design. It is of importance to study designs which have a large number of subdesigns. A subdesign will lead to an independent estimate of error. The study of subdesigns was started under a previous Air Force contract and continued under the present contract. The paper on Baer Subdesigns of Symmetric Balanced Incomplete Designs has now appeared in "Essays in Probability and Statistics" in honor of J. Ogawa [3].

Further work on subdesigns has been continued; subdesigns of symmetric group divisible designs with the dual property have been studied, and a number of theorems obtained. In particular, some applications have been made to symmetric near planes. This paper [4] has been accepted for publication in the Journal of Statistical Planning and Inference.

(c)  Early  History  of  Multivariate  Analysis 

A comprehensive paper [5] giving the early history of multivariate analysis, up to [...], was prepared and formed the subject matter of the inaugural address at the 4th International Symposium on Multivariate Analysis held at Dayton, June 1975.


SECTION  II 

Work  Related  to  Professor  Srivastava 


Most of the work under Part B was done by Professor Srivastava. For part of the work, he had one Ph.D. student, D. H. Mallenby, who, at the time of this writing, is finishing up his Ph.D. dissertation. This Ph.D. dissertation was entirely supported by this Air Force contract.

The work in this section will be summarized under six different headings. These correspond to the six different topics on which significant work was performed. These topics are as follows:

1.  Inference  in  Search  Linear  Models. 

2.  Some  Studies  on  Missing  Data  Techniques. 

3.  Some  Studies  on  Designs  for  Factor  Screening. 

4.  Some  Studies  on  Optimal  Factorial  Designs. 

5.  Application  of  Search  Linear  Models  to  the  Diagnosis  of  Patients. 

6.  Application  of  Search  Linear  Models  to  Industrial  Psychology. 

In the succeeding sections 2.1-2.6, we describe the work done on these topics.

(a) Inference in Search Linear Models. In this section we shall summarize some of the main results obtained, which are contained in the Ph.D. dissertation [6] of Mallenby, Chapters 1-4. These results will be published in the form of joint papers by Srivastava and Mallenby. In order to explain the results, we shall have to recall some results from earlier papers of Srivastava [7,8]. After this, we shall briefly summarize some of the results reported in the thesis.

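Consider the following general linear model:

(1.1a)  $\underline{y} = A_1 \underline{\xi}_1 + A_2 \underline{\xi}_2 + \underline{e}$,

(1.1b)  $\mathrm{Exp}(\underline{e}) = \underline{0}$,  $\mathrm{Var}(\underline{e}) = \sigma^2 I_N$,

where $\underline{y}\,(N \times 1)$ is a vector of observations, $\underline{e}\,(N \times 1)$ is the error vector; $A_1\,(N \times \nu_1)$, $A_2\,(N \times \nu_2)$ are known matrices, $\underline{\xi}_1\,(\nu_1 \times 1)$ is a vector of fixed unknown parameters, and $\sigma^2$ is a known or unknown constant. About $\underline{\xi}_2\,(\nu_2 \times 1)$, partial information is available.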

The vector $\underline{\xi}_2$ consists of fixed parameters whose value is unknown. However, it is given that the elements of $\underline{\xi}_2$ are all negligible except for a set of k elements, where k is a known positive integer; it is not known which particular subset of k elements of $\underline{\xi}_2$ is non-negligible. In actual applications of this model, k would usually be much smaller than $\nu_2$. This is called the search linear model with fixed effects.

With this model, the inference problem is to search out the k (possibly) non-negligible elements of $\underline{\xi}_2$, and to make inferences on these elements of $\underline{\xi}_2$ and also on the elements of $\underline{\xi}_1$. The design problem is to determine the nature of observations $\underline{y}$ (and hence the matrices $A_1$ and $A_2$), so that the search and inference problems can be handled efficiently.

Notice that in the search linear model (1.1a), there is an extra term $A_2 \underline{\xi}_2$. This term is not present in ordinary general linear models, since in ordinary linear models the concept of 'search' is missing. This shows that we should expect the search model to fit real life situations better than ordinary linear models. Indeed, very often, in real life situations where one attempts to fit a linear model, one comes to a point where one feels that he has included all parameters in his model which he could lay his finger on. At the same time, he may feel that the model he has hypothesized may not fit well enough, since he knows from his experience that there must be a few more parameters which are non-negligible. However, he cannot include these parameters in his model since these parameters could be any ones out of a large set of parameters, and he does not know exactly which parameters he should include. He may not include all of the remaining parameters because trying to estimate all of them by using the ordinary linear model, and thus determining the non-negligible parameters, would involve too many observations. Also, since he knows that most of the parameters out of the large set of parameters are negligible, the ordinary linear model is not quite applicable. Clearly, the situation is described appropriately by the search linear model defined above.

The above search model was first considered by Srivastava [7]. The case when $\underline{e} = \underline{0}$ (the vector each of whose elements is zero) is called the noiseless case. This case is quite important since any difficulties arising in the noiseless case also remain present when noise is imposed.

We now present some results from Srivastava [7,8] for later use.

Theorem 1.1: Consider model (1.1a,b) under the noiseless case. A necessary and sufficient condition that the elements of $\underline{\xi}_1$ may be found exactly, and the correct non-negligible set of parameters can be searched out of $\underline{\xi}_2$ with certainty and their exact values found, is that for every $(N \times 2k)$ submatrix $A_{20}$ of $A_2$ we have

(1.2)  $\mathrm{Rank}(A_1 : A_{20}) = \nu_1 + 2k$.

We note that this means we must have at least $(\nu_1 + 2k)$ observations, that is,

(1.3)  $N \ge (\nu_1 + 2k)$.

Examples are abundant where matrices $A_1$ and $A_2$ satisfy condition (1.2), and where N attains the lower bound in (1.3).
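For small matrices the rank condition can be verified by brute force. The following sketch is illustrative only: it assumes the condition reads "every choice of 2k columns of A_2, adjoined to A_1, gives a matrix of rank ν_1 + 2k", and the function names are hypothetical. Exact rational arithmetic is used so that the rank is not affected by rounding.

```python
from fractions import Fraction
from itertools import combinations


def matrix_rank(rows):
    """Exact rank of a matrix given as a list of rows (Gaussian elimination)."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for c in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][c] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][c] / m[rank][c]
            if factor != 0:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def satisfies_rank_condition(a1_rows, a2_rows, k):
    """True if every 2k columns of A2, adjoined to A1, have rank v1 + 2k."""
    v1 = len(a1_rows[0])
    v2 = len(a2_rows[0])
    for subset in combinations(range(v2), 2 * k):
        rows = [list(r1) + [r2[j] for j in subset]
                for r1, r2 in zip(a1_rows, a2_rows)]
        if matrix_rank(rows) != v1 + 2 * k:
            return False
    return True


if __name__ == "__main__":
    # Pure search example (v1 = 0), k = 1, N = 2 = 2k observations.
    a1 = [[], []]
    a2_good = [[1, 0, 1],
               [0, 1, 1]]
    a2_bad = [[1, 2, 0],
              [0, 0, 1]]   # first two columns are proportional
    print(satisfies_rank_condition(a1, a2_good, 1))  # True
    print(satisfies_rank_condition(a1, a2_bad, 1))   # False
```

The first example attains the lower bound N = ν_1 + 2k. The number of subsets examined grows as the binomial coefficient of ν_2 over 2k, so this direct check is practical only for small problems.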

The relevance of Theorem 1.1 to the noisy case (i.e. $\sigma^2 > 0$) is that the rank condition (1.2) is still necessary, but it will, of course, be no longer sufficient for the inference problem to be solved. The model (1.1a,b) in which $A_1$, $A_2$ satisfy (1.2) is called the strongly resolvable search linear model with fixed effects. Also, if we call T the design corresponding to the observations $\underline{y}$, then if $A_1$, $A_2$ satisfy (1.2) we say T is a search design of resolving power $\{\underline{\xi}_1;\ \underline{\xi}_2,\ k\}$. Important classes of such designs have already been obtained (for example, by Srivastava and Ghosh [9]) in the context of factorial experiments.

Srivastava [8] has shown that the search and estimation problem concerning $\underline{\xi}_2$ can be separated from the estimation problem for $\underline{\xi}_1$. Accordingly, it is easier to consider the model in which $\underline{\xi}_1$ is absent; this is called the pure search linear model. Set $\nu_1 = 0$, $\nu_2 = \nu$, $A_2 = A$, and $\underline{\xi}_2 = \underline{\xi}$ in (1.1a,b) to obtain the model

(1.4)  $\underline{y} = A\underline{\xi} + \underline{e}$,  $\mathrm{Exp}(\underline{e}) = \underline{0}$,  $\mathrm{Var}(\underline{e}) = \sigma^2 I_N$,

where it is given that the elements of $\underline{\xi}$ are negligible except possibly for a set of at most k elements, where k is a known positive integer; however, the non-negligible subset of $\underline{\xi}$ is not known. From (1.3) we have $N \ge 2k$.

The rank condition becomes

(1.5)  $\mathrm{rank}(A_0) = 2k$

for every $(N \times 2k)$ submatrix $A_0$ of A. There are $\binom{\nu}{k}$ distinct sets $\underline{\xi}_i$, $i = 1, 2, \ldots, \binom{\nu}{k}$, of k elements of $\underline{\xi}$, and the corresponding model is

(1.6)  $\underline{y} = A_i \underline{\xi}_i + \underline{e}$;  $\mathrm{Exp}(\underline{e}) = \underline{0}$,  $\mathrm{Var}(\underline{e}) = \sigma^2 I_N$,

where $A_i$ is the set of k columns from A corresponding to the elements of $\underline{\xi}_i$. The search problem is to select the correct set of k non-negligible effects, or equivalently, to pick the correct model from the set of $\binom{\nu}{k}$ models at (1.6). Notice that, in view of (1.5), each of these models is a full rank general linear model. Also, note that we must take A with columns normalized so that all the parameters are given equal weighting; in other words, we have

(1.7)  $A = [\underline{a}_1, \ldots, \underline{a}_\nu]$, where $\underline{a}_i' \underline{a}_i = 1$, $i = 1, 2, \ldots, \nu$.

Four methods of search (for general k) were proposed by Srivastava [7]. Of these, two are based on the selection of that subset of k parameters (to correspond to the possibly non-zero set) which in some sense corresponds to a "small" sum of squares due to error. In the other two methods, we consider estimating the sum of squares of the non-negligible parameters. All of these methods, even under the assumption of normality, lead to rather intricate distribution theory problems. Besides the above four methods, a fifth method has been developed, where the idea is to try to dichotomize the set of parameters into two parts such that one part has the 'large' parameters, and the other does not. There is the possibility that, in general, this approach would have the merit of requiring less computation in the selection process. Below, we proceed to describe these procedures in a little more detail, and explain the results obtained.

Now we shall describe in detail the various methods of search that have been studied. We shall refer to the general search linear model with two sets of parameters, $\underline{\xi}_1$ and $\underline{\xi}_2$. We shall assume that there is some set of at most k parameters, out of the set $\underline{\xi}_2$, which is non-zero. We shall assume that the value of k is known. The methods then are as follows.

Method I. This method consists of taking a given subset of k parameters out of $\underline{\xi}_2$, and considering the ordinary linear model with $\nu_1 + k$ parameters so obtained. Corresponding to this general linear model with $\nu_1 + k$ parameters, one could calculate the sum of squares due to error. Corresponding to the ith set of parameters ($i = 1, \ldots, \binom{\nu_2}{k}$) out of $\underline{\xi}_2$, we shall get an error sum of squares, which we denote by $s_i^2$. We could calculate the value of $s_i^2$ for each value of i. Let the integer $i_0$ be such that $s_{i_0}^2 = \min_i s_i^2$. Then, under Method I, the subset of k parameters in $\underline{\xi}_2$ which corresponds to the value $i_0$ of i will be considered as the non-negligible set of parameters.
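A minimal sketch of Method I follows, assuming only what is stated above: for each k-subset of the columns of A_2, fit the linear model on the columns of A_1 together with that subset, and keep the subset with the smallest error sum of squares. The function names are hypothetical, and the residual is computed by Gram-Schmidt projection rather than by any routine from the thesis.

```python
from itertools import combinations


def error_sum_of_squares(columns, y):
    """Residual sum of squares of y after projection on span(columns)."""
    residual = list(y)
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:  # orthogonalize against directions already used
            coef = sum(a * b for a, b in zip(v, q))
            v = [a - coef * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        if norm < 1e-12:
            continue  # column adds no new direction
        q = [a / norm for a in v]
        basis.append(q)
        coef = sum(a * b for a, b in zip(residual, q))
        residual = [a - coef * b for a, b in zip(residual, q)]
    return sum(a * a for a in residual)


def method_one(a1_columns, a2_columns, y, k):
    """Return (subset, s2) minimizing the error sum of squares over k-subsets."""
    best_subset, best_sse = None, None
    for subset in combinations(range(len(a2_columns)), k):
        cols = list(a1_columns) + [a2_columns[j] for j in subset]
        sse = error_sum_of_squares(cols, y)
        if best_sse is None or sse < best_sse:
            best_subset, best_sse = subset, sse
    return best_subset, best_sse


if __name__ == "__main__":
    a1 = [[1, 1, 1, 1]]                       # general mean
    a2 = [[1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
    y = [5, 5, -1, -1]                        # 2 * mean + 3 * a2[1], no noise
    print(method_one(a1, a2, y, 1))
```

In this noiseless example the subset `(1,)` is selected with zero error sum of squares, while the other two candidates each leave a residual sum of squares of 36.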

Method II. In this method we attempt to obtain an estimate of the sum of squares of the non-negligible parameters. That subset of k parameters out of $\underline{\xi}_2$ corresponding to which this estimate is the largest is then considered to be the non-negligible subset.

Method III. This is a variation of Method I. In this case, we choose, arbitrarily, a number c, where c is positive. A good value of c is not known, but one advisable value might be $c_u$, where $c_u$ is a number such that there are approximately $u \times \binom{\nu_2}{k}$ values of i such that $s_i^2 \le c_u$. We then consider the set of all parameters in $\underline{\xi}_2$, and make a frequency distribution of these, noting the number of times any such parameter occurs in those subsets of k parameters out of $\underline{\xi}_2$ for which $s_i^2 \le c_u$. After this frequency distribution is made, we choose the k parameters of $\underline{\xi}_2$ corresponding to which the frequency is the highest. Various values of u could be used in practice. One advisable value may be u = 0.1.
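The frequency count of Method III is easy to express once the error sums of squares are available. The sketch below is an illustration under stated assumptions: the error sums of squares are supplied as a dictionary keyed by subsets of parameter indices, c_u is taken to be the value below which a fraction u of them fall, and ties in frequency are broken by order of first appearance. The function name is hypothetical.

```python
from collections import Counter
from math import ceil


def method_three(sse_by_subset, k, u=0.1):
    """Pick the k parameters occurring most often among the low-SSE subsets.

    sse_by_subset maps a tuple of parameter indices to its error sum of
    squares; c_u is chosen so that about u * (number of subsets) of them
    have an error sum of squares not exceeding c_u.
    """
    ordered = sorted(sse_by_subset.items(), key=lambda item: item[1])
    n_keep = max(1, ceil(u * len(ordered)))
    c_u = ordered[n_keep - 1][1]
    counts = Counter()
    for subset, sse in ordered:
        if sse <= c_u:
            counts.update(subset)
    chosen = [param for param, _ in counts.most_common(k)]
    return sorted(chosen), c_u


if __name__ == "__main__":
    # Hypothetical error sums of squares for all pairs out of 4 parameters.
    sse = {(0, 1): 0.1, (0, 2): 0.5, (1, 3): 0.6,
           (1, 2): 2.0, (0, 3): 2.5, (2, 3): 3.0}
    print(method_three(sse, k=2, u=0.5))  # ([0, 1], 0.6)
```

With u = 0.5 the three subsets with the smallest error sums of squares are retained; parameters 0 and 1 each occur twice among them and are selected.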

Method IV. This is a variation of Method II in the same direction as Method III is for Method I.

Method V. This method is a bit too complex to be stated here in complete detail. However, the main idea is as follows. We shall discuss the idea with reference to the case when k = 1, and $\underline{\xi}_1$ is a null set, so that $\nu_1 = 0$. The matrix $A_2$ will here be denoted simply by A. By considering the subspaces of the column space of A, we divide the set of parameters $\underline{\xi}$ (which here denotes $\underline{\xi}_2$ for simplicity) into two parts, $\underline{\xi}_{11}$ and $\underline{\xi}_{12}$, which are mutually exclusive and exhaustive in the sense that taken together they include all the elements of $\underline{\xi}$. Now, we construct a quadratic form $\underline{y}' Q_1 \underline{y}$, and we choose a number $\alpha_1$ such that if $\underline{y}' Q_1 \underline{y}$ turns out to be larger than $\alpha_1$, then we decide that the non-negligible parameter belongs to the subset $\underline{\xi}_{11}$, and otherwise we decide that it belongs to $\underline{\xi}_{12}$.

Similarly, we construct another quadratic form $\underline{y}' Q_2 \underline{y}$, and take another constant $\alpha_2$, such that if $\underline{y}' Q_2 \underline{y}$ turns out to be larger than $\alpha_2$, then we decide that the non-negligible parameter belongs to a subset $\underline{\xi}_{21}$ of $\underline{\xi}$, and that otherwise it belongs to the complementary subset $\underline{\xi}_{22}$. The subsets $\underline{\xi}_{21}$ and $\underline{\xi}_{22}$ are mutually exclusive and exhaustive. Now, consider the two quadratic forms taken together. Suppose that, in the particular example, the first quadratic form indicates that the non-negligible parameter belongs to $\underline{\xi}_{11}$ and the second quadratic form indicates that it belongs to $\underline{\xi}_{22}$; then we decide that the non-negligible parameter belongs to the intersection of the two subsets, namely $\underline{\xi}_{11} \cap \underline{\xi}_{22}$.

Similarly, we proceed with other quadratic forms. We choose a positive integer $\ell$, and construct $\ell$ quadratic forms. For the ith case, the set $\underline{\xi}$ is divided into two subsets, $\underline{\xi}_{i1}$ and $\underline{\xi}_{i2}$, a quadratic form $\underline{y}' Q_i \underline{y}$ is constructed, and a number $\alpha_i$ is chosen such that if $\underline{y}' Q_i \underline{y}$ is larger than $\alpha_i$, then we decide that the non-negligible parameter belongs to $\underline{\xi}_{i1}$; otherwise we decide it belongs to $\underline{\xi}_{i2}$. Thus, for each i ($i = 1, \ldots, \ell$) we decide whether the non-negligible parameter belongs to $\underline{\xi}_{i1}$ or $\underline{\xi}_{i2}$. Finally, taking all these quadratic forms together, we decide that the non-negligible parameter belongs to $\bigcap_{i=1}^{\ell} \underline{\xi}_{i j_i}$, where each $j_i$ takes one of the values 1 and 2. The number $\ell$ is chosen in such a way that these intersections of subsets will contain at most one parameter. Finally, decision rules are constructed to arrive at the non-negligible parameter, starting from this point.
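The intersection step of Method V can be sketched as follows. This is an illustration only: the construction of the quadratic forms and the choice of the thresholds are not reproduced here, so the values of the quadratic forms are simply passed in as numbers. The function name, the parameter labels and the two dichotomies are hypothetical.

```python
def locate_parameter(parameters, dichotomies, statistics, thresholds):
    """Intersect the halves indicated by each quadratic form.

    dichotomies: list of (first_half, second_half) pairs of parameter sets.
    statistics:  observed values of the quadratic forms, one per dichotomy.
    thresholds:  the constants against which the statistics are compared.
    A statistic above its threshold points to the first half, otherwise
    to the second half.
    """
    candidates = set(parameters)
    for (first, second), value, alpha in zip(dichotomies, statistics, thresholds):
        candidates &= set(first) if value > alpha else set(second)
    return candidates


if __name__ == "__main__":
    params = ["p0", "p1", "p2", "p3"]
    # Two dichotomies suffice to separate four parameters.
    splits = [({"p0", "p1"}, {"p2", "p3"}),
              ({"p0", "p2"}, {"p1", "p3"})]
    # Hypothetical values of the two quadratic forms and their thresholds.
    print(locate_parameter(params, splits, [5.0, 0.2], [1.0, 1.0]))  # {'p1'}
```

Here the first statistic exceeds its threshold and the second does not, so the candidate set is narrowed to the single parameter common to the first half of the first split and the second half of the second. With well-chosen dichotomies the number of forms needed grows roughly like the logarithm of the number of parameters, which is the source of the computational saving mentioned above.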

In the first four chapters of the thesis, these methods and many small variations of these are studied. The application of these is also studied, particularly to the case of $2^m$ fractional factorial designs. Here, two kinds of examples are considered. One is that where $\underline{\xi}_1$ is the null set, and $\underline{\xi}_2$ contains all the parameters of the $2^m$ factorial. The other is the case where $\underline{\xi}_1$ corresponds to the general mean, the main effects, and the two factor interactions, and $\underline{\xi}_2$ corresponds to the remaining parameters. Such designs are called designs of resolution 5.k. Only the case k = 1 is considered.

Methods II and IV are studied very little. Some theoretical study is made for Method I, particularly in connection with its application to the factorial designs, where $\underline{\xi}_1$ is the null set. Method V is developed quite a bit, and the probability of correct search under this method is studied theoretically. How to construct the quadratic forms $Q_i$ and the subsets $\underline{\xi}_{i1}$ and $\underline{\xi}_{i2}$ is discussed in detail. The probability of correct search is calculated theoretically for the application to the factorials, where $\nu_1 = 0$ (i.e. $\underline{\xi}_1$ is a null set).

A number of Monte Carlo studies have also been made regarding Methods I and V and many of their variations, particularly with reference to the two applications mentioned. The probability of correct search for Method I turns out to be very high. Method V is also not too bad. The chief advantage of Method V seems to be its potential application to other cases, and its generalizability. Method I, in general, turns out to be theoretically very complex, so far as the probability of correct search is concerned. Also, in general, the calculation of $\binom{\nu}{k}$ sums of squares due to error, particularly when $\nu$ is large and k is even moderate, seems to be a large volume of computation. On the other hand, Method V needs just a few quantities and, therefore, it seems more attractive. However, as it stands now, the question of determination of the $\alpha$'s is not quite solved, although one variation of Method V where the $\alpha$'s are done away with seems to be promising.
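A Monte Carlo estimate of the probability of correct search can be set up in a few lines. The sketch below is not the simulation design of the thesis. It assumes the pure search model with k = 1, unit-length columns, normal errors, and a single non-negligible parameter of size `beta` placed at a randomly chosen position; all names and numerical settings are hypothetical.

```python
import random


def simulate_correct_search(columns, beta, sigma, trials, seed=0):
    """Estimate the probability that Method I (k = 1) finds the right column."""
    rng = random.Random(seed)
    n = len(columns[0])
    hits = 0
    for _ in range(trials):
        true_index = rng.randrange(len(columns))
        y = [beta * columns[true_index][r] + rng.gauss(0.0, sigma)
             for r in range(n)]
        # With unit-length columns and k = 1 the error sum of squares is
        # y'y - (a_i'y)^2, so the smallest one has the largest |a_i'y|.
        scores = [abs(sum(a * b for a, b in zip(col, y))) for col in columns]
        if scores.index(max(scores)) == true_index:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    design = [[0.5, 0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5, -0.5],
              [0.5, 0.5, -0.5, -0.5],
              [0.5, -0.5, -0.5, 0.5]]
    for sigma in (0.0, 0.5, 1.0):
        print(sigma, simulate_correct_search(design, beta=2.0, sigma=sigma,
                                             trials=2000))
```

In the noiseless case the estimate is exactly 1, as Theorem 1.1 requires for a design satisfying the rank condition; the estimate falls as sigma grows relative to `beta`.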

The most heartening result seems to be the following. The probability of correct search, as shown by Monte Carlo studies, for the case of the factorials where $\underline{\xi}_1$ is not a null set, turns out to be very high. This shows that the designs obtained by Srivastava and Ghosh are exceedingly good.

(b) Missing Data

In Chapter V of the above thesis, some missing data techniques have been compared using Monte Carlo work. Also, for a particular case, some theoretical studies are made.

The problem is this. Suppose that we have a trivariate normal population. This population has three univariate marginals, three bivariate marginals, and, of course, one trivariate marginal which is the whole population itself. Thus, it has $2^3 - 1 = 7$ subpopulations in it, where it is considered a subpopulation of itself. Now, suppose that samples are given not only from the trivariate marginal but also from many or all of the six other subpopulations. These samples may be of different sizes. Now suppose that we want to estimate the mean vector $\underline{\mu}\,(3 \times 1)$, the correlations $\rho_{12}, \rho_{13}, \rho_{23}$, the variances $\sigma_1^2, \sigma_2^2, \sigma_3^2$, and hence also the covariances of the trivariate marginal. The Monte Carlo studies show that for small samples there are a couple of methods which seem to be generally better than all the other methods studied.

Also, the theoretical study mentioned above corresponds to the case where all elements of $\underline{\mu}$ are equal, all the correlations are equal, and also all the three variances are equal.

(c) Application of Search Linear Models to Reduced-Size Factor Screening Designs

Below we give a summary of the paper on the subject which has appeared in the proceedings [10] of the Ninth International Biometric Conference, Boston, 1976, pages 139-162.

The problem of factor screening can be stated in many ways, depending upon the assumptions. In this paper subsets of the following assumptions are made.

(C1a) Out of the total of m factors, at most d factors are effective.

(C1b) All factors have the same prior probability p of being effective.

(C2) The effective variables have much greater effect than all of the unimportant variables combined. In other words, the experimental error is small.

(C3a) There are no interactions among factors.

(C3b) There are interactions among factors. However, if an interaction involving say r (≥ 2) factors is "large", then any interaction (or "main effect") involving a subset of one or more of these r factors is also large.

(C4) The "direction" of possible effects is known. In other words, we know the "sign" of any effect. (This assumption will be often made along with (C3a).)
Most often the subset of assumptions is one of the assumptions (C1a) or (C1b), the assumptions (C2) and (C4), and one of the assumptions (C3a) or (C3b).

The problem is this: how to conduct an experiment with minimal size such that under an appropriate subset of the above assumptions, we are able to search for all the effective factors.

Large factor screening experiments are common in industry and in biology. One biological example is the detection of a rare attribute among members of a large population. A famous example is the Wassermann type of blood test. Most situations considered so far involve a large number of factors, with interactions assumed completely absent. However, in agricultural and [...] work, one is often concerned with experiments where the number of factors is relatively small, and furthermore, where interactions between two or more factors may be present. This paper, therefore, deals with both types of situations.

The purpose of this paper is manifold. We first establish the connection between the general area of factor screening and the newly started theory of search linear models. This leads to certain directions of development involving certain properties of zero-one matrices. The property $P_t$ is one of these. A matrix is said to have property $P_t$ if every set of t columns of the matrix is linearly independent. Notice that the matrix could be over the real field or over finite fields. The property $P_t$, over finite fields, was found to be of central importance in the theory of confounded factorial designs. Later on, it was found to be of importance also in the theory of orthogonal fractional factorial designs, and of linear error-correcting and error-detecting codes. In this paper we find that we need a similar property over the real field. Noting connections between the real field and the finite field GF(2), it is shown that certain matrices which have been studied earlier in the context of factorial designs and/or coding theory are useful also for factor screening experiments. The next step is to consider the process of factor screening given the observations from a particular design. (In this paper the word "design" will mean merely a set of treatment combinations from a $2^m$ factorial. We shall always assume we have m factors, each at two levels.) Finally, multistage procedures and reduced-size designs are considered. An important feature of the area, namely, the situation where observations involve errors of variation, is not considered in this paper.
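Property P_t over GF(2) can be tested directly on small matrices. The sketch below is an illustration under stated assumptions: each column is encoded as a Python integer whose bits are the entries of the column, only the field GF(2) is handled (the real-field version needed for factor screening is not shown), and the function names are hypothetical.

```python
from itertools import combinations


def gf2_rank(vectors):
    """Rank over GF(2) of vectors encoded as integers (bit i = coordinate i)."""
    pivots = {}  # leading bit position -> reduced vector with that leading bit
    for v in vectors:
        while v:
            lead = v.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = v
                break
            v ^= pivots[lead]  # eliminate the leading bit and continue
    return len(pivots)


def has_property_pt(columns, t):
    """True if every set of t columns is linearly independent over GF(2)."""
    return all(gf2_rank(subset) == t for subset in combinations(columns, t))


if __name__ == "__main__":
    # The seven non-zero columns of length 3: the parity-check matrix of
    # the binary Hamming code of length 7.
    columns = list(range(1, 8))
    print(has_property_pt(columns, 2))  # True
    print(has_property_pt(columns, 3))  # False, since 1 + 2 + 3 = 0 over GF(2)
```

The matrix in the example has property P_2 but not P_3, which corresponds to the minimum weight 3 of the associated code.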

(d) Comparison of Various Optimality Criteria with Respect to Balanced Optimal Factorial Designs of the $2^m$ Type.

This work has not yet been written up, but it formed the basis of a lecture given by Srivastava at the International Symposium on Statistics and its Applications, held at the Indian Statistical Institute, Calcutta, during December, 1974, in honor of Professor Mahalanobis. In this paper, we consider $2^m$ factorial designs, with $4 \le m \le 8$. For each value of m, a whole range of practical values of the number of runs N is considered. In various papers of the author, either alone or with Chopra, trace optimal factorial designs have been constructed. In this paper, we also consider optimality with respect to other criteria, and present a comparison.

Some main features of this work are very briefly described by Srivastava [11], which is largely a review paper contained in a volume edited by Dr. Krishnaiah, entitled Developments in Statistics, to be published by the Academic Press.
(e) Application of Search Linear Models to the Diagnosis of Patients


Although  the  title  has  been  given  in  terms  of  diagnosis  of  patients, 

the  statistical  problem  is  general  and  is  applicable  in  various  other 

situations.  An  invited  lecture  on  this  topic  was  presented  in  the  annual 

meeting  of  the  Classification  Society  of  America  at  Rochester  in  May,  1976. 

However,  the  paper  has  not  yet  been  written.  I am  waiting  for  a suitable 

type  of  data  to  illustrate  the  theoretical  ideas,  and  the  paper  will  be 

written  as  soon  as  such  data  becomes  available.  Of  course,  the  proper 

acknowledgement  to  the  present  Air  Force  contract  will  be  made. 

The statistical problem considered is as follows. It concerns the
classification of an individual in terms of 'inner conditions,' when information
on that individual is given on 'external variables.' For example, in the
case of diagnosis of disease from outward symptoms, the inner condition
would correspond to the stages of disease complicated by one or more
internal factors $X_1, \ldots, X_m$. The external variables would then correspond
to the measurements on external symptoms, such as blood count, pulse rate,
etc. Now there may be initial data available, as a result of previous
detailed study, or prolonged experience. Suppose data is available for
N cases, the information for the ith case being given in terms of both the
internal and external variables. Values of the internal variables could
be $(X_{i1}, \ldots, X_{im})$, and the information on the external variables may be
given as a vector $(y_{i1}, \ldots, y_{ip})$. Usually, because of insufficient data,
it may be that not all the different possible combinations of the X's are
available. In other words, some inner conditions may be more common
than others, and information on certain inner conditions may not be available.
Similarly, in certain cases some of the y observations may not be available.

The approach considered here is to use a linear model which expresses
the expected value of each y observation in terms of the corresponding
X observations. The theory of search linear models is then used to estimate
the various coefficients accurately. Finally, after a good model between
the y's and X's becomes known, using the theory of discriminant functions,
including Mahalanobis distance, a method is proposed for classifying a new
observation vector $(\hat{y}_1, \ldots, \hat{y}_p)$ into one of the various X vectors. In
the talk given, the X vectors were restricted to the case where each
$X_j = 1$ or $-1$, representing the presence or absence of some internal
condition. This made possible the application of the present work done
on the theory of $2^m$ factorial designs to the present problem.
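
The two-stage idea can be sketched as follows, writing y for the external measurements and X for the internal factors coded as 1 or -1. This is a minimal sketch under stated assumptions, not the method of the lecture: ordinary least squares with a mean and main effects stands in for the search linear model step, a single pooled residual covariance is assumed, and the data, the coefficient values and the function names are all hypothetical.

```python
import itertools
import numpy as np

def fit_linear_model(x, y):
    """Least-squares fit of E[y] = b0 + B'x (assumed main-effects model).

    Returns the (m+1) x p coefficient matrix and the pooled residual
    covariance matrix of the p external variables.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(a, y, rcond=None)
    resid = y - a @ coef
    cov = resid.T @ resid / (len(x) - a.shape[1])
    return coef, cov

def classify(y_new, coef, cov):
    """Assign y_new to the X vector in {-1, 1}^m whose predicted mean is
    nearest in squared Mahalanobis distance."""
    m = coef.shape[0] - 1
    cov_inv = np.linalg.inv(cov)
    best_x, best_dist = None, np.inf
    for x in itertools.product([-1, 1], repeat=m):
        mean = np.concatenate([[1.0], x]) @ coef
        d = y_new - mean
        dist = float(d @ cov_inv @ d)
        if dist < best_dist:
            best_x, best_dist = x, dist
    return best_x, best_dist

# Hypothetical data: m = 2 internal factors, p = 3 external measurements.
# The combination (1, -1) is never observed in the initial data.
rng = np.random.default_rng(0)
true_coef = np.array([[10.0, 5.0, 0.0],   # mean
                      [2.0, 0.0, 1.0],    # effect of X_1
                      [0.0, 2.0, -1.0]])  # effect of X_2
observed = [(-1, -1), (-1, 1), (1, 1)]
x_train = np.array(observed * 12)
a_train = np.column_stack([np.ones(len(x_train)), x_train])
y_train = a_train @ true_coef + rng.normal(scale=0.3, size=(len(x_train), 3))

coef, cov = fit_linear_model(x_train, y_train)
y_new = np.array([1.0, 1.0, -1.0]) @ true_coef  # generated from X = (1, -1)
label, distance = classify(y_new, coef, cov)
print(label)
```

Because the fitted model predicts a mean for every X vector, a new case can be assigned to an inner condition that never occurred in the initial data, which is the situation of missing combinations described above.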


(f)  Application  of  Search  Linear  Models  to  Industrial  Psychology 

I intend to write a paper on this subject later on. However, this
idea developed during a visit to Lackland Air Force Base in Texas.
Some of the scientists there informed the author about certain models they
were trying to develop to predict job difficulty. The predictor variables
were eight variables denoted by $X_1, \ldots, X_8$, and functions of these.
For example, $X_1$ indicated the number of tasks performed, $X_2$ the mean
difficulty level of a task, from nine-level ratings, $X_3$ the same as $X_2$
except that it is from seven-level ratings, $X_4$ the task difficulty per
unit time spent from nine-level ratings, $X_5$ the same as $X_4$ from seven-
level ratings, $X_6$ the job difficulty at the average grade level, $X_7$ the
time spent on selected tasks, and $X_8$ the range of task difficulty. The
functions of these considered were simple sums of pairs of a few of these,
or squares of some of them, or products of some of them. In all, the scientists
had studied 14 functions, which they denoted respectively by $Z_1, \ldots, Z_{14}$.

The following question arises. Are there other important functions $\{Z_i\}$?
Could it be that there are functions $\{Z_i\}$ which would serve better than
the functions that they have taken? An obvious answer to this is to use
the theory of search linear models.
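
One way to picture such a search is sketched below. It is only an illustration under stated assumptions: the candidate functions are all pair sums, squares and pair products of eight predictors, the base model contains a mean and the eight predictors themselves, and the search picks the k candidates whose inclusion gives the smallest residual sum of squares. The data are simulated and every name in the code is hypothetical; none of it comes from the Lackland study.

```python
import itertools
import numpy as np

def candidate_functions(x):
    """Names and columns for squares, pair sums and pair products of x's columns."""
    n_cols = x.shape[1]
    names, cols = [], []
    for j in range(n_cols):
        names.append(f"X{j + 1}^2")
        cols.append(x[:, j] ** 2)
    for j, k in itertools.combinations(range(n_cols), 2):
        names.append(f"X{j + 1}+X{k + 1}")
        cols.append(x[:, j] + x[:, k])
        names.append(f"X{j + 1}*X{k + 1}")
        cols.append(x[:, j] * x[:, k])
    return names, np.column_stack(cols)

def residual_ss(a, y):
    """Residual sum of squares after a least-squares fit of y on a."""
    coef, *_ = np.linalg.lstsq(a, y, rcond=None)
    r = y - a @ coef
    return float(r @ r)

def search_best_subset(base, candidates, names, y, k=1):
    """Try every set of k candidate functions; keep the set with the
    smallest residual sum of squares when added to the base model."""
    best_names, best_ss = None, np.inf
    for subset in itertools.combinations(range(candidates.shape[1]), k):
        a = np.column_stack([base, candidates[:, list(subset)]])
        ss = residual_ss(a, y)
        if ss < best_ss:
            best_names, best_ss = [names[i] for i in subset], ss
    return best_names, best_ss

# Simulated job-difficulty data in which one product term really matters.
rng = np.random.default_rng(1)
x = rng.normal(size=(60, 8))
y = (1.0 + x[:, 0] + 0.5 * x[:, 3] + 2.0 * x[:, 1] * x[:, 5]
     + rng.normal(scale=0.1, size=60))

names, candidates = candidate_functions(x)
base = np.column_stack([np.ones(len(x)), x])
chosen, ss = search_best_subset(base, candidates, names, y, k=1)
print(chosen)
```

With a base model that already contains every predictor, the pair sums add nothing new, so in this setup only squares and products can improve the fit. The exhaustive search grows quickly with k, which is one reason a design-based theory for the search is attractive.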

